Processor Accessing A Scratch Pad On-Demand To Reduce Power Consumption

ABSTRACT

The present invention provides processing systems, apparatuses, and methods that access a scratch pad on-demand to reduce power consumption. In an embodiment, an instruction fetch unit initiates an instruction fetch. When a scratch pad is enabled, an instruction is retrieved from the scratch pad in parallel with a translation of a virtual address to a physical address. If the physical address is associated with the scratch pad, the retrieved instruction is provided to an execution unit. Otherwise, the scratch pad is disabled to reduce power consumption and the instruction fetch is re-initiated. When the scratch pad is disabled, an instruction is retrieved from another instruction source, such as an instruction cache, in parallel with the translation of the virtual address to the physical address. If the physical address is associated with the scratch pad, the scratch pad is enabled and the instruction fetch is re-initiated.

CROSS REFERENCE TO RELATED APPLICATIONS

This application is a continuation of co-pending U.S. application Ser. No. 11/272,737, filed on Nov. 15, 2005, entitled “Processor Accessing a Scratch Pad On-Demand to Reduce Power Consumption,” now allowed, which is incorporated herein by reference in its entirety. This application is also related to commonly owned, co-pending U.S. application Ser. No. 11/272,718, filed on Nov. 15, 2005, entitled “Processor Utilizing A Loop Buffer To Reduce Power Consumption,” and commonly owned, co-pending U.S. application Ser. No. 11/272,719, filed on Nov. 15, 2005, entitled “Microprocessor Having A Power-Saving Instruction Cache Way Predictor And Instruction Replacement Scheme,” each of which is incorporated herein by reference in its entirety.

FIELD OF THE INVENTION

The present invention relates generally to microprocessors and to reducing power consumption in microprocessors.

BACKGROUND OF THE INVENTION

An instruction fetch unit of a microprocessor is responsible for continually providing the next appropriate instruction to an execution unit of the microprocessor. Generally, an instruction fetch unit computes a virtual address for the next instruction to be fetched, translates the virtual address to a physical address, retrieves an instruction corresponding to the physical address, and provides the instruction to the execution unit. When multiple instruction sources, such as an instruction cache and a scratch pad, are available, the instruction fetch unit may not be able to determine which instruction source to use to retrieve the desired instruction until the virtual address is translated into a physical address. Rather than waiting for the virtual address to be translated, a conventional instruction fetch unit may access all of the instruction sources simultaneously while the address is translated. After the address translation is completed, a conventional instruction fetch unit will inspect the retrieved instructions to determine if the desired instruction was retrieved by one of the instruction sources. If none of the instruction sources has retrieved the desired instruction, a conventional instruction fetch unit uses the translated address to target the appropriate instruction source to retrieve the desired instruction.

Although accessing all the instruction sources simultaneously may reduce the time required to retrieve an instruction, it unnecessarily consumes a significant amount of the total power of a microprocessor. This makes microprocessors having conventional fetch units undesirable and/or impractical for many applications.

What is needed is a microprocessor that can access a variety of instruction sources while consuming less power than a microprocessor having a conventional fetch unit.

BRIEF SUMMARY OF THE INVENTION

The present invention provides processing systems, apparatuses, and methods for accessing a scratch pad on-demand to reduce power consumption.

In one embodiment, an instruction fetch unit of a processor is configured to provide instructions from several instruction sources, such as an instruction cache and a scratch pad, to an execution unit of the processor. When the scratch pad is enabled, the scratch pad is accessed to retrieve an instruction based on a virtual address. In parallel with the scratch pad access, a memory management unit (MMU) is accessed to translate the virtual address into a physical address. If the physical address is associated with the scratch pad, the instruction retrieved from the scratch pad is provided to the execution unit of the processor for execution. If the physical address is not associated with the scratch pad, the scratch pad is disabled to reduce power consumption, and the instruction fetch unit re-initiates the instruction fetch so that the instruction can be retrieved from an instruction source other than the scratch pad.

In one embodiment, when the scratch pad is not enabled, another instruction source, such as the instruction cache, is accessed to retrieve an instruction based on the virtual address. In parallel with the instruction retrieval, the MMU is accessed to translate the virtual address into a physical address. If the physical address is associated with the scratch pad, the scratch pad is enabled and the instruction fetch unit re-initiates the instruction fetch so that the instruction can be retrieved from the scratch pad. In one embodiment, if the physical address is not associated with the scratch pad, the instruction retrieved from the other instruction source is provided to the execution unit of the processor for execution.

In one embodiment, another instruction source, such as the instruction cache, is disabled to reduce power consumption when the scratch pad is enabled, and that instruction source is enabled when the scratch pad is disabled.

In one embodiment, components of a processor, such as the instruction cache and the scratch pad, are disabled to reduce power consumption by controlling the clock signal that is delivered to each component. By maintaining the input clock signal at either a constant high or a constant low value, state registers in the component are suspended from latching new values, and the logic blocks between the state registers are placed in a stable state. Once a component is placed in a stable state, the transistors in its state registers and logic blocks are suspended from changing states and therefore do not consume the power required to transition states.

In one embodiment, when a component is disabled to reduce power consumption, a bias voltage is applied to the component to further reduce power consumption resulting from leakage.

Further embodiments, features, and advantages of the present invention, as well as the structure and operation of the various embodiments of the present invention, are described in detail below with reference to the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS/FIGURES

The accompanying drawings, which are incorporated herein and form a part of the specification, illustrate the present invention and, together with the description, further serve to explain the principles of the invention and to enable a person skilled in the pertinent art to make and use the invention.

FIG. 1 is a diagram of a processor according to an embodiment of the present invention.

FIG. 2 is a more detailed diagram of the processor of FIG. 1.

FIG. 3 is a flow chart illustrating the steps of a method embodiment of the present invention.

The present invention will be described with reference to the accompanying drawings. The drawing in which an element first appears is typically indicated by the leftmost digit(s) in the corresponding reference number.

DETAILED DESCRIPTION OF THE INVENTION

The present invention provides processing systems, apparatuses, and methods for accessing a scratch pad on-demand to reduce power consumption. In the detailed description of the invention that follows, references to “one embodiment”, “an embodiment”, “an example embodiment”, etc., indicate that the embodiment described may include a particular feature, structure, or characteristic, but every embodiment may not necessarily include the particular feature, structure, or characteristic. Moreover, such phrases are not necessarily referring to the same embodiment. Further, when a particular feature, structure, or characteristic is described in connection with an embodiment, it is submitted that it is within the knowledge of one skilled in the art to effect such feature, structure, or characteristic in connection with other embodiments whether or not explicitly described.

FIG. 1 is a diagram of a processor 100 according to an embodiment of the present invention. Processor 100 includes a processor core 110, an instruction cache 102, and a scratch pad 104. Processor core 110 includes an instruction fetch unit 120 and an execution unit 106. Processor 100 may access an external memory 108. Instructions retrieved from external memory 108 can be cached in instruction cache 102. Instruction fetch unit 120 interfaces with instruction cache 102, scratch pad 104, execution unit 106, and memory 108 through buses 112, 114, 116, and 118, respectively. As would be appreciated by those skilled in the relevant arts, instruction sources such as instruction cache 102 and scratch pad 104 may also be placed within processor core 110, within instruction fetch unit 120, or external to processor 100. Memory 108 may be, for example, a level two cache, a main memory, a read-only memory (ROM), or another storage device that is capable of storing instructions.

FIG. 2 is a more detailed diagram of processor 100 according to one embodiment of the present invention. As shown in FIG. 2, instruction fetch unit 120 includes a fetch controller 200, a multiplexer 208, a comparator 210, and an address register 220. Fetch controller 200 interfaces with multiplexer 208, scratch pad 104, instruction cache 102, and execution unit 106 through buses 218, 214, 212, and 216, respectively. Buses 204, 214, and 222 represent components of bus 114. Buses 202, 212, and 222 represent components of bus 112. Buses 206 and 216 represent components of bus 116.

Register 220 stores a virtual address of an instruction to be fetched. Fetch controller 200 updates register 220 via bus 226 with the address of the instruction to be fetched. The virtual address stored in register 220 is made available to instruction cache 102, scratch pad 104, and a memory management unit (MMU) 224 through bus 222.

Memory management unit (MMU) 224 translates a virtual address provided by register 220 into a physical address. In one embodiment, MMU 224 is implemented using, for example, a translation lookaside buffer (TLB).

MMU 224 may be placed within processor 100, within processor core 110, or within instruction fetch unit 120.

An address, such as the virtual address stored in register 220, includes a tag and an offset. The tag refers to a certain number of the most significant bits in an address. The offset refers to the remaining bits in the address.

During address translation, only the bits in the tag of a virtual address are translated to generate a physical address. Hence, a virtual address and its corresponding physical address share the same bits for the offset. Since the bits in the offset of the physical address can be extracted from the virtual address prior to address translation, an instruction source such as scratch pad 104 or instruction cache 102 may be configured to guess and retrieve an instruction based solely on the offset of the virtual address.
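The tag/offset relationship can be made concrete with a short Python sketch. It is an illustrative model only: the 32-bit address width, the 12-bit offset (4 KiB pages), the dictionary standing in for MMU 224, and the helper names `split`, `join`, and `translate` are all assumptions, not details taken from the description.

```python
from typing import Dict, Tuple

# Illustrative assumptions: 32-bit addresses with a 12-bit offset.
OFFSET_BITS = 12
OFFSET_MASK = (1 << OFFSET_BITS) - 1


def split(addr: int) -> Tuple[int, int]:
    """Return (tag, offset); the tag is the most significant bits."""
    return addr >> OFFSET_BITS, addr & OFFSET_MASK


def join(tag: int, offset: int) -> int:
    """Rebuild an address from a tag and an offset."""
    return (tag << OFFSET_BITS) | offset


def translate(vaddr: int, page_table: Dict[int, int]) -> int:
    """Translate only the tag; the offset passes through unchanged.

    `page_table` is a hypothetical stand-in for MMU 224 (e.g. a TLB);
    a missing entry raises KeyError, modeling a translation miss.
    """
    vtag, offset = split(vaddr)
    return join(page_table[vtag], offset)


if __name__ == "__main__":
    table = {0x00400: 0x1F000}
    vaddr = 0x00400ABC
    paddr = translate(vaddr, table)
    # The offset is known before translation finishes, so an instruction
    # source can start its lookup with it immediately.
    assert split(vaddr)[1] == split(paddr)[1]
    print(hex(paddr))  # 0x1f000abc
```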

When an instruction source is configured to retrieve an instruction based on the offset of the virtual address, the instruction source will provide an instruction as well as a tag of the physical address of the instruction. After the virtual address is translated, the tag of the instruction can be compared with the tag of the translated address to determine if the correct instruction was actually retrieved. If the guess was wrong, the instruction source can use the now-known translated address to retrieve the correct instruction.

Scratch pad 104 is a memory preferably configured to provide instructions having a physical address with a tag specified in register 226. Hence, scratch pad 104 provides instructions for a single contiguous range of physical addresses. The size of the range is the number of instructions that can be uniquely identified by the bits of the offset. Scratch pad 104 may be enabled and disabled. When disabled, scratch pad 104 reduces power consumption. When enabled, scratch pad 104 retrieves an instruction based on the offset of the virtual address stored in register 220 in parallel with the address translation performed by MMU 224. Once the translation is completed by MMU 224, the tag in register 226 can be compared with the tag of the translated address to determine if the instruction retrieved by scratch pad 104 corresponds to the virtual address stored in register 220. Scratch pad 104 provides a retrieved instruction on bus 204.

In one embodiment, scratch pad 104 may be configured to provide instructions from two or more contiguous ranges of physical addresses. In such an embodiment, a separate tag register is provided to specify each range, and the tags stored in each tag register are compared with the tag of the address translated by MMU 224 to determine if the virtual address stored in register 220 corresponds to one of the contiguous ranges of physical addresses associated with scratch pad 104.
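As a rough behavioral sketch (not the circuit itself), scratch pad 104 can be modeled with one bank per tag register: every bank is indexed by the virtual-address offset before translation, and the translated tag then selects the matching bank, if any. The 12-bit offset, fixed 4-byte instructions, the choice to read every range speculatively in the multi-range embodiment, and the class and method names are assumptions made for illustration.

```python
from typing import Dict, List, Optional

# Assumptions for illustration: 12-bit offset, fixed 4-byte instructions.
OFFSET_BITS = 12
OFFSET_MASK = (1 << OFFSET_BITS) - 1
INSTR_BYTES = 4
WORDS_PER_RANGE = (1 << OFFSET_BITS) // INSTR_BYTES  # instructions per range


class ScratchPad:
    """Each tag register (cf. register 226) names one contiguous physical
    range, backed here by its own bank of WORDS_PER_RANGE instructions."""

    def __init__(self, tags: List[int]) -> None:
        self.tags = list(tags)  # programmable tag registers
        self.banks: Dict[int, List[int]] = {t: [0] * WORDS_PER_RANGE for t in self.tags}
        self.enabled = True

    def store(self, paddr: int, instr: int) -> None:
        """Write an instruction; raises KeyError if paddr is in no range."""
        self.banks[paddr >> OFFSET_BITS][(paddr & OFFSET_MASK) // INSTR_BYTES] = instr

    def speculative_read(self, vaddr: int) -> Dict[int, int]:
        """Read using only the offset of the virtual address, so the access
        can overlap translation. Returns one candidate per range."""
        if not self.enabled:
            raise RuntimeError("scratch pad is disabled")
        index = (vaddr & OFFSET_MASK) // INSTR_BYTES
        return {tag: bank[index] for tag, bank in self.banks.items()}

    @staticmethod
    def select(candidates: Dict[int, int], translated_tag: int) -> Optional[int]:
        """After translation, keep the candidate whose tag matches, if any."""
        return candidates.get(translated_tag)


if __name__ == "__main__":
    sp = ScratchPad([0x1F000])
    sp.store(0x1F000ABC, 0xDEADBEEF)
    candidates = sp.speculative_read(0x00400ABC)  # issued before translation
    print(sp.select(candidates, 0x1F000) == 0xDEADBEEF)  # hit
    print(sp.select(candidates, 0x00080))  # no match: another source owns it
```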

Register 226 may be implemented, for example, as part of scratch pad 104 or as part of instruction fetch unit 120. When register 226 is implemented as part of scratch pad 104, the tag stored in register 226 is made available to comparator 210 even when scratch pad 104 is disabled. In one embodiment, the tag in register 226 may be changed programmatically.

When enabled, instruction cache 102 provides instructions not provided by scratch pad 104. Instruction cache 102 may be enabled and disabled. When disabled, instruction cache 102 reduces power consumption. When enabled, instruction cache 102 retrieves an instruction using the offset of the virtual address stored in register 220. In addition, instruction cache 102 retrieves a tag of the physical address associated with the instruction. The retrieval of the instruction is performed in parallel with the address translation performed by MMU 224. After MMU 224 completes the translation, the instruction's tag is compared with the tag of the translated address to determine if the retrieved instruction corresponds to the virtual address stored in register 220. Instruction cache 102 provides a retrieved instruction on bus 202.

Instruction cache 102 may be implemented, for example, as a direct-mapped or a set-associative cache. When the instruction cache is implemented as a set-associative cache, one or more bits in the offset of the virtual address stored in register 220 may be used as an index to select a set (or a way).
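A direct-mapped variant of instruction cache 102 can be sketched the same way. The line index is taken from address bits that lie inside the offset, so the lookup can start before translation, and the stored physical tag is checked afterwards. The line size (16 bytes), number of lines (64), 4-byte instructions, 12-bit offset, and all names are assumptions; with these values the index uses bits 4 through 9, which fall within the untranslated offset.

```python
from typing import List, Tuple

# Assumed geometry: 12-bit offset, 16-byte lines, 64 lines (1 KiB total).
OFFSET_BITS = 12
LINE_BYTES = 16
NUM_LINES = 64
LINE_SHIFT = LINE_BYTES.bit_length() - 1   # 4 byte-within-line bits
WORDS_PER_LINE = LINE_BYTES // 4           # assumes 4-byte instructions

# The index bits must stay inside the offset so they need no translation.
assert NUM_LINES * LINE_BYTES <= (1 << OFFSET_BITS)


class DirectMappedICache:
    def __init__(self) -> None:
        self.valid = [False] * NUM_LINES
        self.tags = [0] * NUM_LINES  # physical tag stored with each line
        self.data: List[List[int]] = [[0] * WORDS_PER_LINE for _ in range(NUM_LINES)]

    @staticmethod
    def _index(addr: int) -> int:
        return (addr >> LINE_SHIFT) % NUM_LINES

    def fill(self, paddr: int, words: List[int]) -> None:
        """Install a line fetched from memory at physical address paddr."""
        i = self._index(paddr)
        self.valid[i], self.tags[i], self.data[i] = True, paddr >> OFFSET_BITS, list(words)

    def speculative_read(self, vaddr: int) -> Tuple[bool, int, int]:
        """Return (valid, stored physical tag, instruction) from offset bits only."""
        i = self._index(vaddr)
        return self.valid[i], self.tags[i], self.data[i][(vaddr % LINE_BYTES) // 4]

    @staticmethod
    def is_hit(result: Tuple[bool, int, int], translated_tag: int) -> bool:
        """Compare the stored tag with the tag produced by the MMU."""
        valid, tag, _ = result
        return valid and tag == translated_tag
```

For example, after `fill(0x1F000AB0, [1, 2, 3, 4])`, a speculative read at virtual address `0x00400AB8` lands on the same line and returns the third word; it is a hit only if the MMU maps virtual tag `0x00400` to physical tag `0x1F000`.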

Comparator 210 determines whether the virtual address stored in register 220 corresponds to an instruction provided by scratch pad 104. The tag stored in register 226 is provided to comparator 210 on bus 230. After MMU 224 translates the virtual address stored in register 220, MMU 224 provides the tag of the translated address to comparator 210 on bus 228. Comparator 210 compares the two tags to determine if they match. If they match, then the virtual address stored in register 220 corresponds to an instruction provided by scratch pad 104. The result of comparator 210 is provided to fetch controller 200 on bus 232. Based on the result of the comparison, fetch controller 200 causes multiplexer 208 to select between an instruction provided by scratch pad 104 on bus 204 and an instruction provided by instruction cache 102 on bus 202.
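Behaviorally, comparator 210 and multiplexer 208 reduce to a tag equality test driving a 2:1 select. The Python below is only a sketch; the function and parameter names are hypothetical and simply echo the reference numerals in FIG. 2.

```python
def comparator_210(tag_register_226: int, translated_tag_228: int) -> bool:
    """Result on bus 232: True when the translated tag names the scratch pad range."""
    return tag_register_226 == translated_tag_228


def multiplexer_208(select_scratch_pad: bool, bus_204: int, bus_202: int) -> int:
    """Pass the scratch pad word (bus 204) or the instruction cache word (bus 202)."""
    return bus_204 if select_scratch_pad else bus_202


def select_instruction(tag_226: int, translated_tag: int, sp_word: int, ic_word: int) -> int:
    # Fetch controller 200 drives the multiplexer from the comparator result.
    return multiplexer_208(comparator_210(tag_226, translated_tag), sp_word, ic_word)
```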

Because fetch controller 200 does not know whether the virtual address stored in register 220 corresponds to an instruction associated with scratch pad 104 or instruction cache 102 until after MMU 224 translates the virtual address, fetch controller 200 can access both scratch pad 104 and instruction cache 102 to retrieve instructions simultaneously. Once fetch controller 200 determines which instruction source should provide the instruction, fetch controller 200 can discard any incorrectly retrieved instructions. Although accessing scratch pad 104 and instruction cache 102 at the same time minimizes delay time, having both scratch pad 104 and instruction cache 102 enabled for every instruction fetch consumes a significant amount of the total power of processor 100.

Instructions of a program tend to exhibit spatial and temporal locality; thus, scratch pad 104 and instruction cache 102 are each likely to be utilized to provide a sequence of instructions at a time. The present invention, as described herein, takes advantage of this observation in embodiments by enabling only one of scratch pad 104 or instruction cache 102 at any time. If scratch pad 104 is enabled to retrieve instructions and fetch controller 200 later determines, after the address translation by MMU 224, that the instruction should be retrieved from instruction cache 102, scratch pad 104 is disabled to reduce power consumption and the instruction fetch is restarted with instruction cache 102 enabled. Similarly, if instruction cache 102 is enabled to retrieve instructions and fetch controller 200 later determines during the course of the instruction fetch that the instruction should be provided by scratch pad 104, instruction cache 102 is disabled to reduce power consumption and the instruction fetch is restarted with scratch pad 104 enabled.

For programs that tend to retrieve instructions from scratch pad 104 and instruction cache 102 in bursts, enabling and disabling scratch pad 104 and instruction cache 102 will cause minimal performance degradation, since the amount of time spent to enable and disable scratch pad 104 and instruction cache 102 will be small compared to the amount of time spent providing instructions from scratch pad 104 and instruction cache 102. By disabling scratch pad 104 and instruction cache 102 in the manner described above, power savings are achieved.
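The trade-off described above can be illustrated with a small counting model. It is a sketch under simplifying assumptions, not a power model of processor 100: each read of a source array counts as one unit, a wrong guess costs one re-initiated fetch, the instruction cache is assumed to be the source enabled first, and the traces are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class OnDemandFetch:
    """Only one of the two sources is enabled and read on each fetch."""
    scratch_pad_enabled: bool = False  # assumption: the cache is enabled first
    array_reads: int = 0               # source-array reads (rough power proxy)
    refetches: int = 0                 # re-initiated fetches (rough latency proxy)

    def fetch(self, in_scratch_pad: bool) -> str:
        """`in_scratch_pad` is the post-translation result from the comparator."""
        self.array_reads += 1  # read only the currently enabled source
        if self.scratch_pad_enabled != in_scratch_pad:
            # Wrong source was enabled: swap the enables and re-initiate.
            self.scratch_pad_enabled = in_scratch_pad
            self.refetches += 1
            self.array_reads += 1
        return "scratch pad" if in_scratch_pad else "instruction cache"


def conventional_reads(trace: List[bool]) -> int:
    """A conventional fetch unit reads both sources on every fetch."""
    return 2 * len(trace)


if __name__ == "__main__":
    # Hypothetical bursty trace: 100 scratch pad fetches, then 100 cache fetches.
    trace = [True] * 100 + [False] * 100
    model = OnDemandFetch()
    for hit in trace:
        model.fetch(hit)
    print(model.array_reads, model.refetches, conventional_reads(trace))  # 202 2 400
```

With an alternating trace such as `[True, False] * 50`, the same model performs 200 reads (no fewer than the conventional 200) and re-initiates all 100 fetches, which is why the approach relies on instructions arriving from each source in bursts.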

Although the present invention attempts to disable scratch pad 104 when it is not providing instructions, scratch pad 104 is not disabled if it is performing another function. For example, if instructions are being stored into scratch pad 104, scratch pad 104 will not be disabled until after the instructions are stored in scratch pad 104. Likewise, if instruction cache 102 is performing another function, instruction cache 102 will not be disabled until it has completed the function.

FIG. 3 depicts a flow chart illustrating the steps of a method 300 according to an embodiment of the present invention. Method 300 is used to retrieve instructions by a processor having access to a scratch pad and an instruction cache. While method 300 can be implemented, for example, using a processor according to the present invention, such as processor 100 illustrated in FIGS. 1-2, it is not limited to being implemented by processor 100. Method 300 begins with step 302.

In step 302, a virtual address of an instruction to be fetched and provided to an execution unit of a processor is determined. The virtual address may correspond, for example, to an instruction that can be provided by a scratch pad or an instruction cache of a processor. In one embodiment, an instruction fetch unit of the processor determines the virtual address of an instruction to be fetched by incrementing the virtual address of the previously fetched instruction or by using the target address of a jump or a branch instruction that was previously executed.
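The address selection in step 302 amounts to choosing between the sequential successor and a branch or jump target. The sketch below assumes fixed 4-byte instructions and 32-bit wraparound, neither of which the method requires; the function name is hypothetical.

```python
from typing import Optional

INSTR_BYTES = 4          # assumption: fixed-length 4-byte instructions
ADDR_MASK = 0xFFFFFFFF   # assumption: 32-bit virtual addresses


def next_fetch_address(prev_vaddr: int, branch_target: Optional[int] = None) -> int:
    """Step 302: use a taken jump/branch target, else increment past the previous instruction."""
    if branch_target is not None:
        return branch_target & ADDR_MASK
    return (prev_vaddr + INSTR_BYTES) & ADDR_MASK
```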

In step 304, the virtual address determined in step 302 is translated to generate a physical address. In parallel with the address translation, the instruction cache provides an instruction based on the virtual address. In one embodiment, a memory management unit performs the address translation.

In step 306, the physical address generated in step 304 is examined to determine if it is associated with an instruction that is provided by a scratch pad. For example, if the scratch pad provides instructions for a range of physical addresses associated with a single tag, the tag is compared with the tag of the physical address generated in step 304 to determine if they match. If the tags match, the physical address generated in step 304 is associated with the scratch pad.

If the physical address is associated with the scratch pad, method 300 proceeds to step 308. Otherwise, method 300 proceeds to step 328.

In step 308, the scratch pad is enabled unless it is already enabled. The scratch pad may already be enabled, for example, to store instructions into the scratch pad.

In step 310, the instruction cache is disabled to reduce power consumption. Control proceeds to step 312.

In step 312, the fetch for an instruction corresponding to the virtual address determined in step 302 is re-performed. Since the scratch pad was enabled in step 308, the scratch pad retrieves an instruction based on the virtual address determined in step 302.

In step 314, the instruction retrieved from the scratch pad is provided to an execution unit of the processor for execution. Control proceeds to step 316.

In step 316, a virtual address of an instruction to be fetched and provided to the execution unit of the processor is determined, as in step 302. Control proceeds to step 318.

In step 318, the virtual address determined in step 316 is translated to generate a physical address. In parallel with the address translation, the scratch pad retrieves an instruction based on the virtual address.

In step 320, the physical address generated in step 318 is examined to determine if it is associated with an instruction that is provided by the scratch pad. If the physical address is associated with the scratch pad, method 300 proceeds to step 314. Otherwise, method 300 proceeds to step 322.

In step 322, the scratch pad is disabled to reduce power consumption unless the scratch pad must remain enabled for another purpose. For example, if instructions are being stored in the scratch pad, the scratch pad will be disabled at a later time when instructions are no longer being stored in the scratch pad.

In step 324, the instruction cache is enabled. Control proceeds to step 326.

In step 326, the fetch for an instruction corresponding to the virtual address determined in step 316 is re-performed. Since the instruction cache was enabled in step 324, the instruction cache retrieves an instruction based on the virtual address determined in step 316.

In step 328, if the tag of the instruction retrieved from the instruction cache matches the tag of the physical address generated by translating the virtual address of the instruction to be fetched, the instruction retrieved from the instruction cache is provided to the execution unit of the processor for execution. Otherwise, the instruction cache utilizes the physical address that was generated by translating the virtual address to retrieve and provide the correct instruction to the execution unit. The instruction cache, for example, may retrieve the correct instruction from an external memory. After step 328, method 300 proceeds to step 302.
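Method 300 can be traced end to end with a behavioral sketch. Everything concrete here is an assumption for illustration: the 12-bit offset, dictionaries standing in for the MMU, scratch pad, instruction cache, and external memory, and the `sp_busy` flag that models the scratch pad being kept enabled for another purpose (step 322). Comments map each branch to the steps of FIG. 3.

```python
from dataclasses import dataclass, field
from typing import Dict

OFFSET_BITS = 12  # assumed offset width
OFFSET_MASK = (1 << OFFSET_BITS) - 1


@dataclass
class Method300:
    sp_tag: int                     # tag register naming the scratch pad range
    page_table: Dict[int, int]      # virtual tag -> physical tag (stand-in MMU)
    scratch_pad: Dict[int, int]     # physical address -> instruction
    memory: Dict[int, int]          # external memory
    icache: Dict[int, int] = field(default_factory=dict)
    sp_enabled: bool = False
    ic_enabled: bool = True
    on_sp_path: bool = False        # False: steps 304-306; True: steps 318-320
    sp_busy: bool = False           # e.g. instructions being stored into it
    refetches: int = 0

    def _translate(self, vaddr: int) -> int:
        vtag, offset = vaddr >> OFFSET_BITS, vaddr & OFFSET_MASK
        return (self.page_table[vtag] << OFFSET_BITS) | offset

    def _from_cache(self, paddr: int) -> int:
        # Step 328: on a tag mismatch, use the physical address to fill from memory.
        if paddr not in self.icache:
            self.icache[paddr] = self.memory[paddr]
        return self.icache[paddr]

    def fetch(self, vaddr: int) -> int:
        """Return the instruction at vaddr (the address from step 302 or 316)."""
        paddr = self._translate(vaddr)                  # step 304 / 318
        in_sp = (paddr >> OFFSET_BITS) == self.sp_tag   # step 306 / 320
        if self.on_sp_path:
            if in_sp:
                return self.scratch_pad[paddr]          # step 314
            if not self.sp_busy:
                self.sp_enabled = False                 # step 322
            self.ic_enabled = True                      # step 324
            self.on_sp_path = False
            self.refetches += 1                         # step 326
            return self._from_cache(paddr)              # step 328
        if not in_sp:
            return self._from_cache(paddr)              # step 328
        self.sp_enabled = True                          # step 308
        self.ic_enabled = False                         # step 310
        self.on_sp_path = True
        self.refetches += 1                             # step 312
        return self.scratch_pad[paddr]                  # step 314


if __name__ == "__main__":
    m = Method300(
        sp_tag=0x1F000,
        page_table={0x00400: 0x1F000, 0x00401: 0x00080},
        scratch_pad={0x1F000000: 0xA, 0x1F000004: 0xB},
        memory={0x00080000: 0xC},
    )
    for va in (0x00400000, 0x00400004, 0x00401000, 0x00401000):
        print(hex(va), hex(m.fetch(va)), m.refetches)
```

Only the first fetch of each burst pays for a re-initiated fetch; subsequent fetches from the same source complete on the first attempt.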

As described herein, a component of a processor, such as an instruction cache or a scratch pad, may be disabled to reduce power consumption in accordance with the present invention by controlling the input clock signal of the component. By controlling the input clock signal so that the clock is maintained at a constant high or a constant low value, state registers in the component are suspended from latching new values. As a result, logic blocks between the state registers are kept in a stable state and the transistors in the logic blocks are suspended from changing states. Hence, when the input clock signal is controlled, the transistors in the state registers and logic blocks of the component are suspended from changing states, and therefore no power is required to change states. Only the power required to maintain a stable state is consumed. In one embodiment, when a component is disabled to reduce power consumption, a bias voltage is applied to the component to further reduce power consumption arising from leakage.
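The effect of holding the clock constant can be illustrated by counting bit transitions in a modeled state register. This is a toy switching-activity model assumed for illustration only: toggles stand in for dynamic power, and leakage (which the bias-voltage embodiment addresses) is not modeled. The class name is hypothetical.

```python
class GatedRegister:
    """State register whose clock can be held constant (gated)."""

    def __init__(self, width: int = 32) -> None:
        self.mask = (1 << width) - 1
        self.value = 0
        self.toggles = 0  # bit transitions so far: rough proxy for dynamic power

    def clock(self, d: int, clock_running: bool) -> None:
        """Apply one clock cycle with input d."""
        if not clock_running:
            return  # clock held high or low: no edge, so nothing is latched
        new = d & self.mask
        self.toggles += bin(self.value ^ new).count("1")
        self.value = new


if __name__ == "__main__":
    running, gated = GatedRegister(8), GatedRegister(8)
    for d in (0xFF, 0x00, 0xFF):
        running.clock(d, clock_running=True)
        gated.clock(d, clock_running=False)
    print(running.toggles, gated.toggles)  # 24 0
```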

While various embodiments of the present invention have been described above, it should be understood that they have been presented by way of example, and not limitation. It will be apparent to persons skilled in the relevant computer arts that various changes in form and detail can be made therein without departing from the spirit and scope of the invention. Furthermore, it should be appreciated that the detailed description of the present invention provided herein, and not the summary and abstract sections, is intended to be used to interpret the claims. The summary and abstract sections may set forth one or more but not all exemplary embodiments of the present invention as contemplated by the inventors.

For example, in addition to implementations using hardware (e.g., within or coupled to a Central Processing Unit (“CPU”), microprocessor, microcontroller, digital signal processor, processor core, System on Chip (“SOC”), or any other programmable or electronic device), implementations may also be embodied in software (e.g., computer readable code, program code, instructions and/or data disposed in any form, such as source, object or machine language) disposed, for example, in a computer usable (e.g., readable) medium configured to store the software. Such software can enable, for example, the function, fabrication, modeling, simulation, description, and/or testing of the apparatus and methods described herein. For example, this can be accomplished through the use of general programming languages (e.g., C, C++), GDSII databases, hardware description languages (HDL) including Verilog HDL, VHDL, SystemC, SystemC Register Transfer Level (RTL), and so on, or other available programs, databases, and/or circuit (i.e., schematic) capture tools. Such software can be disposed in any known computer usable medium including semiconductor, magnetic disk, optical disk (e.g., CD-ROM, DVD-ROM, etc.) and as a computer data signal embodied in a computer usable (e.g., readable) transmission medium (e.g., carrier wave or any other medium including digital, optical, or analog-based medium). As such, the software can be transmitted over communication networks including the Internet and intranets.

It is understood that the apparatus and method embodiments described herein may be included in a semiconductor intellectual property core, such as a microprocessor core (e.g., embodied in HDL), and transformed to hardware in the production of integrated circuits. Additionally, the apparatus and methods described herein may be embodied as a combination of hardware and software. Thus, the present invention should not be limited by any of the above-described exemplary embodiments, but should be defined only in accordance with the following claims and their equivalents.

1. A system comprising: a processor having a processor core, a fetch unit and a register for storing a portion of an address for an instruction to be fetched; a first memory source for storing instructions and a scratch pad memory for storing instructions, both of which couple to the processor by a bus, wherein the instruction is made available to the processor from the scratch pad memory when the portion of an address stored in the register matches a translated virtual instruction address provided by the fetch unit.
2. The system of claim 1 wherein the first memory source is an instruction cache.
3. The system of claim 1 wherein the first memory source is a level one instruction cache.
4. The system of claim 1 wherein the first memory source is a level two cache.
5. The system of claim 1 wherein the first memory source is disabled to reduce power consumption.
6. The system of claim 1 wherein the scratch pad memory is disabled to reduce power consumption if the instruction is not made available by the scratch pad memory.
7. The system of claim 1 wherein the first memory source is enabled and the scratch pad memory is disabled to reduce power consumption if the instruction is not made available by the scratch pad memory.
8. The system of claim 1 wherein the portion of an address for an instruction comprises a tag of the translated physical address of the instruction.
9. The system of claim 8 wherein the instruction is selected from scratch pad memory based on the offset of the virtual instruction address.
10. A method of performing an instruction fetch associated with a virtual address in a processor having a scratch pad memory for storing instructions and a first memory source for storing instructions, comprising: making the instruction available to the processor from the scratch pad memory when the portion of an address stored in a register matches a translated virtual instruction address provided by a fetch unit.
11. The method of claim 10 wherein the first memory source is an instruction cache.
12. The method of claim 10 wherein the first memory source is a level one instruction cache.
13. The method of claim 10 wherein the first memory source is a level two cache.
14. The method of claim 10 wherein the first memory source is disabled to reduce power consumption.
15. The method of claim 10 wherein the scratch pad memory is disabled to reduce power consumption if the instruction is not made available by the scratch pad memory.
16. The method of claim 10 wherein the first memory source is enabled and the scratch pad memory is disabled to reduce power consumption if the instruction is not made available by the scratch pad memory.
17. The method of claim 10 wherein the portion of the address for an instruction comprises a tag of the translated physical address of the instruction.
18. The method of claim 10 wherein the instruction is selected from scratch pad memory based on the offset of the virtual instruction address.
19. A computer program product for use with a computing device, the computer program product comprising: a tangible computer usable medium, having computer readable program code embodied thereon for providing a processor, the computer readable program code comprising: first computer readable program code for providing a fetch unit, second computer readable program code for providing a register for storing a portion of an address for an instruction to be fetched, coupled to the fetch unit, third computer readable program code for providing a first memory source for storing instructions, coupled to the fetch unit, and fourth computer readable program code for providing a scratch pad memory for storing instructions, coupled to the fetch unit, wherein the instruction is made available to the processor from the scratch pad memory when the portion of an address stored in the register matches a translated virtual instruction address provided by the fetch unit.
20. The computer program product of claim 19, wherein the scratch pad memory is disabled to reduce power consumption if the instruction is not made available by the scratch pad memory.